
feat: Add keep_column(s) params to to_dummies #14844

Open, wants to merge 1 commit into main

mcrumiller (Contributor) commented Mar 4, 2024

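Resolves #14831. Adds a simple boolean parameter (keep_column on Series.to_dummies, keep_columns on DataFrame.to_dummies) that retains the original column(s) alongside the dummy columns. It defaults to False to preserve the current behavior.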

>>> import polars as pl
>>> pl.Series("a", [1, 2, 3]).to_dummies(keep_column=True)
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ a_1 ┆ a_2 ┆ a_3 │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ u8  ┆ u8  ┆ u8  │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 0   ┆ 0   │
│ 2   ┆ 0   ┆ 1   ┆ 0   │
│ 3   ┆ 0   ┆ 0   ┆ 1   │
└─────┴─────┴─────┴─────┘

>>> pl.DataFrame({"a": [1, 2], "b": [3, 4]}).to_dummies(keep_columns=True)
shape: (2, 6)
┌─────┬─────┬─────┬─────┬─────┬─────┐
│ a   ┆ a_1 ┆ a_2 ┆ b   ┆ b_3 ┆ b_4 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ u8  ┆ u8  ┆ i64 ┆ u8  ┆ u8  │
╞═════╪═════╪═════╪═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 0   ┆ 3   ┆ 1   ┆ 0   │
│ 2   ┆ 0   ┆ 1   ┆ 4   ┆ 0   ┆ 1   │
└─────┴─────┴─────┴─────┴─────┴─────┘

mcrumiller changed the title from feat(python, rust): add keep-column[s] params to to_dummies to feat(python, rust): add keep_column[s] params to to_dummies Mar 4, 2024
github-actions bot added the enhancement, python, and rust labels Mar 4, 2024
mcrumiller marked this pull request as ready for review March 4, 2024 21:38
s-banach (Contributor) commented Mar 4, 2024

What happens if one of the new dummy columns has the same name as the original column?

mcrumiller (Contributor, Author) commented Mar 4, 2024

> What happens if one of the new dummy columns has the same name as the original column?

That would have to be a very contrived case. Dummy columns are named [original_name]_[value], so a collision requires the frame to already contain a column with exactly that name, and the existing implementation has the same issue:

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "a_1": [1, 2, 3],
})
df.to_dummies("a")
# raises polars.exceptions.DuplicateError: unable to hstack, column with name "a_1" already exists

My guess is that one can contrive many scenarios that collide with Polars' renaming, but it's not in our best interest to fight those edge cases.
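The naming rule can be modeled in a few lines of plain Python, which makes the collision risk concrete. With a non-empty separator such as the default "_", every generated name is strictly longer than its source name, so a dummy column can never reuse the name of its own source column; a clash can only come from some other column in the output. This is a simplified model, not Polars' implementation: dummy_names and output_collisions are hypothetical helpers, and values are assumed to be formatted with str().

```python
from collections import Counter


def dummy_names(name, values, separator="_"):
    # Simplified model of the "[original_name]_[value]" rule: one name per
    # distinct value, sorted lexicographically for a stable order.
    return sorted({f"{name}{separator}{value}" for value in values})


def output_collisions(columns, encode, keep_columns=False):
    # columns: dict of column name -> list of values.
    # encode: names of the columns to turn into dummies.
    # keep_columns: also keep the encoded source columns (the proposed behavior).
    output = [c for c in columns if keep_columns or c not in encode]
    for c in encode:
        output.extend(dummy_names(c, columns[c]))
    # Any name that appears more than once would make the horizontal stack fail.
    counts = Counter(output)
    return sorted(name for name, n in counts.items() if n > 1)


frame = {"a": [1, 2, 3], "a_1": [1, 2, 3]}
print(output_collisions(frame, ["a"]))                     # ['a_1']
print(output_collisions(frame, ["a"], keep_columns=True))  # ['a_1']
```

Both calls report the same clash, which matches the point that keeping the source column does not introduce a new failure mode for this input.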

s-banach (Contributor) commented Mar 4, 2024

As long as it raises the same error.

codecov bot commented Mar 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.33%. Comparing base (6ccb187) to head (2d6e17f).
Report is 1 commit behind head on main.

@@           Coverage Diff           @@
##             main   #14844   +/-   ##
=======================================
  Coverage   79.33%   79.33%           
=======================================
  Files        1548     1548           
  Lines      214245   214262   +17     
  Branches     2460     2460           
=======================================
+ Hits       169968   169984   +16     
- Misses      43719    43720    +1     
  Partials      558      558           
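The figures in this report are internally consistent: Codecov computes coverage as hits / (hits + misses + partials), and that denominator equals the Lines row. The +17 lines split into +16 hits and +1 miss, so the percentage stays at 79.33%. A quick arithmetic check, with coverage as an illustrative helper:

```python
def coverage(hits, misses, partials):
    # Codecov's ratio: hits / (hits + misses + partials).
    return hits / (hits + misses + partials)


base = coverage(hits=169968, misses=43719, partials=558)
head = coverage(hits=169984, misses=43720, partials=558)

# The denominators match the "Lines" row: 214245 and 214262 (+17).
print(f"base {base:.2%}, head {head:.2%}")  # base 79.33%, head 79.33%
```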


mcrumiller changed the title from feat(python, rust): add keep_column[s] params to to_dummies to feat: add keep_column[s] params to to_dummies Mar 12, 2024
mcrumiller requested a review from reswqa as a code owner April 8, 2024 16:45
mcrumiller force-pushed the dummies-keep-column branch 3 times, most recently from 8fa2a87 to a9cd649 November 17, 2024 21:52
mcrumiller changed the title from feat: add keep_column[s] params to to_dummies to feat: Add keep_column(s) params to to_dummies Nov 18, 2024
MarcoGorelli (Collaborator) commented:

Hey, isn't the solution in #14831 (comment) good enough? Perhaps that just needs documenting?

mcrumiller (Contributor, Author) commented:

It was a fairly simple implementation, so there's no harm if you don't want to accept it; it felt like a reasonable parameter to me. Should I close?

Labels: enhancement, python, rust

Successfully merging this pull request may close these issues.

add keep_column parameter to DataFrame.to_dummies
3 participants